Abstract

With the rapid development of aerial technology, applications of Remote Sensing Images (RSI) have become
more diverse. Remote sensing image classification plays a crucial role in analyzing and interpreting Earth
observation data for various applications, such as land cover mapping, environmental monitoring, and urban
planning. However, accurately classifying remote sensing images poses significant challenges due to their
complex spatial and spectral characteristics. In recent years, transfer learning has emerged as a promising
technique to improve classification accuracy by leveraging the knowledge learned by models pre-trained
on large-scale datasets. The proposed work explores different transfer learning strategies employed in
remote sensing image classification, including fine-tuning, feature extraction, and domain adaptation. It
discusses popular pre-trained models, such as VGG16, VGG19, and Inceptionv3, and their applicability to
remote sensing datasets. The advantages and limitations of each strategy are analyzed, providing insights into
their suitability for various remote sensing applications. A comparative study of these techniques is carried
out using performance measures such as accuracy and loss.
1. Introduction

Deep learning and computer vision are used
in various applications such as image classification,
object detection in industrial production, medical
image analysis, action recognition, and remote
sensing. Their applications include hazard response,
urban monitoring, traffic control and many more.
Many techniques exist for removing noise from
grayscale and color photographs. Satellite images
are considered the main source of geographic
information, and satellite image analysis has many
applications in civil engineering, such as design,
construction, urban planning, and water resource
management [1]. The data obtained from satellite
sources are huge and growing exponentially, so
efficient techniques for data extraction are needed.
Through image classification, this large number of
satellite images can be arranged into semantic
categories. Satellite image classification is a
multilevel process that runs from extracting
features from images to classifying them into
categories. Image classification is a step-wise
process that starts with designing a scheme for
classifying the desired images [3]. After that, the
images are preprocessed, which includes image
clustering, image enhancement, scaling, and so on.
In the third step, the desired areas of those images
are selected and initial clusters are generated. The
algorithm is then applied to the images to obtain
the desired classification, and corrective actions
are taken after the algorithm phase, which is also
called postprocessing [4]. The final phase is to
assess the accuracy of the classification. Algorithms
that have been effective on natural scene images are
not adapted to aerial images taken in wide view.
Convolutional Neural Networks are used because of
their performance on natural images. VGG16,
VGG19 and Inceptionv3 each classify an image in a
single forward pass through one network. To
improve classification accuracy and reduce
computation time, the proposed methodology
consists of feature extraction models that give
superior accuracy [6].
2. Literature Survey
Machine learning has long been used for remote
sensing image classification. A typical machine
learning pipeline includes feature extraction,
feature selection and classification. Initially,
Random Forest was employed; later, the
Convolutional Neural Network (CNN) was used
because of its feature extraction capability [12]. A
feature extraction strategy is used to enhance
classification performance [13]. Feature extraction is
the process of representing data as numerical
values, which is a very important step for pattern
identification and visualization [8]. It is an essential
step that helps the model achieve better
accuracy [5]. The methods used to remove noise
and deal with complicated background images
include VGG16, VGG19 and Inceptionv3, trained
with the Adam optimizer [2]. Image classification
methods such as VGG16, VGG19, and InceptionV3
have been used for the feature extraction process.
InceptionV3 is a convolutional neural network for
image analysis and image classification that got its
start as a module for GoogLeNet. The algorithms
VGG16 and VGG19 employ this strategy [9].
Remote sensing image classification algorithms
have been enhanced with context-enhanced
modules. Image classifiers such as VGG16 and
VGG19 use a feature extraction process [12]. The
VGG16 feature extraction model can extract a large
number of features and results in good accuracy. It
is one of the most popular techniques for image
feature extraction, and performs better when a DL
model is applied to classification tasks. Hence, the
VGG16 model has been chosen for feature
extraction in the proposed work. The VGG16 model
is implemented on a collection of MRI scans.
Various layers throughout the architecture of the
proposed models are used to extract features.
3. Methodology 
The following step-by-step procedure implements
the proposed model for detecting objects in
remote sensing images, as shown in Figure 1.

• Data Collection: Data was collected from an
aerial satellite image dataset.
• Preprocessing: Images have been
preprocessed. Data augmentation is done to
increase the data for custom object detection.
• Feature Extraction: The features from
images are extracted using Residual Networks
101 and Zeiler and Fergus Net.
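
The augmentation step can be illustrated with a minimal sketch. The source does not say which transformations were applied, so the horizontal flip and the 90-degree rotations below are assumptions, and the helper names `hflip`, `rot90` and `augment` are hypothetical. Images are modelled as nested lists of pixel values to keep the example free of dependencies.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values (grayscale for simplicity)


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the original plus flipped and rotated variants (assumed set)."""
    variants = [img, hflip(img)]
    rotated = img
    for _ in range(3):  # 90, 180 and 270 degrees
        rotated = rot90(rotated)
        variants.append(rotated)
    return variants


if __name__ == "__main__":
    tile = [[1, 2], [3, 4]]
    for variant in augment(tile):
        print(variant)
```

Each input image yields five training samples in this sketch; a real pipeline would typically apply such transformations on the fly during training rather than storing the copies.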


Figure 1 Flow of proposed system (flowchart steps, in order: Exploring the dataset; Data Preprocessing; Data Visualization; Splitting the data into train and test; Building the Model: VGG16, VGG19, InceptionV3; Training the model; Sign up and Sign in; User input; Final prediction result)


• Visual Geometry Group16: VGG16 is a
deep convolutional neural network developed at the
University of Oxford. VGG16 has 16 weight
layers, which include 13 convolutional layers and
3 fully connected layers.
• Visual Geometry Group19: VGG19 is widely
used in computer vision tasks such as image
classification. It has 19 weight layers, which
include 16 convolutional layers and 3 fully
connected layers.
• InceptionV3: It is a convolutional neural
network developed by Google's research team,
known as the Google Brain Team, for image
classification and recognition tasks.
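
The layer counts quoted above can be checked with a short sketch. The per-block convolution counts follow the standard VGG configurations (five blocks, each ending in one max pooling layer). The 224x224 input size is the usual ImageNet setting and is an assumption here, since the source does not state the input resolution used. The names `VGG_BLOCKS` and `summarize` are hypothetical.

```python
# Convolutional layers per block in the standard VGG configurations.
VGG_BLOCKS = {
    "VGG16": [2, 2, 3, 3, 3],
    "VGG19": [2, 2, 4, 4, 4],
}
FULLY_CONNECTED = 3  # both variants end with three fully connected layers


def summarize(name: str, input_size: int = 224) -> dict:
    """Count weight layers and track the feature-map side after pooling."""
    blocks = VGG_BLOCKS[name]
    conv_layers = sum(blocks)
    size = input_size
    for _ in blocks:
        size //= 2  # each block ends with a 2x2 max pool of stride 2
    return {
        "conv_layers": conv_layers,
        "weight_layers": conv_layers + FULLY_CONNECTED,
        "pool_layers": len(blocks),
        "final_map_side": size,
    }


if __name__ == "__main__":
    for model in VGG_BLOCKS:
        print(model, summarize(model))
```

The only structural difference between the two variants in this view is one extra convolution in each of the last three blocks.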

4. Implementation 

4.1. Dataset

The RSSCN7 dataset [11] contains a total of 2,800
remote sensing images in seven scene classes:
grass land, forest, farm land, parking lot,
residential region, industrial region, and
river/lake. For each class, there are 400 images
collected from Google Earth (Google Inc.),
cropped at four different scales with 100 images
per scale. Each image has a size of 400x400 pixels.
The main challenge of this dataset comes from the
scale variations of the images (Figure 2).


Figure 2 Data Annotation (a) Boundary Box
(b) Data Labelling
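
The train/test splitting step can be sketched as a stratified split over the seven RSSCN7 classes. The 80/20 ratio, the random seed and the file-name pattern are assumptions for illustration only; the source does not state how the split was made. The function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

CLASSES = [
    "grass land", "forest", "farm land", "parking lot",
    "residential region", "industrial region", "river/lake",
]
IMAGES_PER_CLASS = 400  # 4 scales x 100 images per scale

Sample = Tuple[str, str]  # (file name, class label)


def build_index() -> List[Sample]:
    """Create (hypothetical file name, class label) pairs for the dataset."""
    return [
        (f"class{ci}_img{i:03d}.jpg", label)
        for ci, label in enumerate(CLASSES)
        for i in range(IMAGES_PER_CLASS)
    ]


def stratified_split(samples: List[Sample], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split so that every class keeps the same train/test proportion."""
    by_class: Dict[str, List[str]] = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in by_class.items():
        rng.shuffle(paths)
        n_test = int(len(paths) * test_fraction)
        test += [(p, label) for p in paths[:n_test]]
        train += [(p, label) for p in paths[n_test:]]
    return train, test


if __name__ == "__main__":
    train_set, test_set = stratified_split(build_index())
    print(len(train_set), len(test_set))
```

Stratifying by class keeps the seven classes balanced in both subsets, which matters when per-class precision and recall are reported.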


4.2. Algorithms
INCEPTION V3: It is a convolutional neural
network architecture developed as part of the
Inception family of models, which was
introduced by Google researchers in 2014. It
represents a significant advancement in image
recognition and classification tasks. As shown in
Figure 5, it consists of multiple layers, including
convolutional layers, pooling layers, and fully
connected layers [10]. The architecture is based on
the concept of inception modules, which allow the
network to capture features at different scales and
abstraction levels. Inception V3's unique
architecture, computational efficiency, and strong
performance have made it a popular choice for
image recognition tasks. Its contributions have
significantly influenced the field of deep learning
and continue to inspire further research and
development.
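
The idea behind an inception module, parallel branches with different filter sizes whose outputs are concatenated along the channel axis, can be sketched with simple bookkeeping. The branch widths below are illustrative values modelled on an early GoogLeNet-style module, not the exact Inception V3 configuration, and the names `Branch` and `module_summary` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Branch:
    """One parallel path: optional 1x1 reduction, then a k x k convolution."""
    kernel: int          # spatial size of the main convolution (1, 3 or 5)
    out_channels: int
    reduce_to: int = 0   # 1x1 bottleneck width; 0 means no bottleneck

    def weights(self, in_channels: int) -> int:
        """Number of convolution weights (biases ignored)."""
        if self.reduce_to:
            return (in_channels * self.reduce_to
                    + self.reduce_to * self.out_channels * self.kernel ** 2)
        return in_channels * self.out_channels * self.kernel ** 2


def module_summary(in_channels: int, branches: List[Branch]) -> dict:
    """Concatenating branch outputs adds up their channel counts."""
    return {
        "out_channels": sum(b.out_channels for b in branches),
        "weights": sum(b.weights(in_channels) for b in branches),
    }


if __name__ == "__main__":
    branches = [
        Branch(kernel=1, out_channels=64),
        Branch(kernel=3, out_channels=128, reduce_to=96),
        Branch(kernel=5, out_channels=32, reduce_to=16),
        Branch(kernel=1, out_channels=32),  # projection after pooling
    ]
    print(module_summary(192, branches))
```

The 1x1 reductions are what keep the wider 3x3 and 5x5 branches affordable: without the bottleneck, the 5x5 branch alone would need roughly ten times as many weights in this example.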

VGG16: VGG16 is characterized by its deep
architecture, consisting of 16 weight layers. It
follows a sequential structure with 13 convolutional
layers grouped into five blocks, each block followed
by a max pooling layer, and ends with three fully
connected layers. This depth allows the
network to learn increasingly complex and abstract
features from input images. As shown in Figure 3,
VGG16 is known for its extensive use of 3x3
convolutional filters throughout the architecture [7].
By using these small-sized filters, the model can
capture local patterns and details efficiently while
still preserving spatial information. The repeated
use of 3x3 filters allows the network to learn
hierarchical representations, enabling it to recognize
features at different scales. VGG16 has been
pretrained on large-scale image datasets such as
ImageNet, which contains millions of labeled
images. Pretraining involves training the model on a
generic image classification task, which helps it
learn meaningful and generalizable features. The
pretrained weights of VGG16 can then be used as a
starting point for transfer learning, where the model
is fine-tuned on smaller, task-specific datasets. This
approach enables faster convergence and better
performance on specific visual recognition tasks,
even with limited training data.
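
The benefit of stacking 3x3 filters can be made concrete: n stacked 3x3 convolutions with stride 1 cover the same (2n+1)x(2n+1) input window as a single larger filter, but with fewer weights. The sketch below assumes equal input and output channel counts and ignores biases; the function names are hypothetical.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field side of stride-1 convolutions stacked in sequence."""
    field = 1
    for _ in range(num_layers):
        field += kernel - 1
    return field


def conv_weights(kernel: int, channels: int) -> int:
    """Weights of one k x k convolution with `channels` in and out."""
    return kernel * kernel * channels * channels


def compare(num_layers: int, channels: int) -> dict:
    """Compare a 3x3 stack with a single filter of the same receptive field."""
    field = stacked_receptive_field(num_layers)
    return {
        "receptive_field": field,
        "stacked_weights": num_layers * conv_weights(3, channels),
        "single_filter_weights": conv_weights(field, channels),
    }


if __name__ == "__main__":
    for n in (2, 3):
        print(n, compare(n, channels=64))
```

A stack also inserts a non-linearity after every 3x3 layer, so it is both cheaper and more expressive than the single large filter it replaces.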

VGG19: VGG19 is an extension of the VGG16
architecture, featuring 19 weight layers. As shown
in Figure 4, it includes 16 convolutional layers
grouped into five blocks, each block followed by
a max pooling layer, and ends with three fully
connected layers. The additional layers in VGG19
allow for a deeper representation of features, which
can capture more complex patterns and hierarchical
structures in the input images. Similar to VGG16,
VGG19 employs 3x3 convolutional filters
throughout the network. These small-sized filters
are used to convolve the input feature maps and
capture local patterns and details effectively. The
repeated use of 3x3 filters aids in learning
multi-scale representations and enables the network
to recognize features at various levels of abstraction.
VGG19, like its predecessor, can be pretrained on
large-scale image datasets such as ImageNet. The
pretraining process involves training the model on a
large collection of labeled images to learn general
image representations. These pretrained weights can
then be utilized for transfer learning, allowing the
model to be fine-tuned on specific tasks or smaller
datasets. By leveraging pretraining and transfer
learning, VGG19 can benefit from the learned
generic features and achieve better performance
on domain-specific visual recognition tasks.

5. Results
When the InceptionV3, VGG16 and VGG19 algorithms
are used in the process, the images are trained and
tested, and the accuracies of the 3 algorithms are
compared, as shown in Figure 6.

Performance Measures:
To analyze VGG16 (Table 1), VGG19 (Table 2) and
Inception V3 (Table 3), accuracy, precision, recall
and F1-score are examined to determine how well
each model works.
• Accuracy = no. of correct predictions / total no. of predictions: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
• Recall = no. of correctly predicted actual positives / total no. of actual positives: $\text{Recall} = \frac{TP}{TP + FN}$
• Precision = no. of correct positive predictions / total no. of positive predictions: $\text{Precision} = \frac{TP}{TP + FP}$
• F1 Score: $F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \cdot \frac{TP}{TP+FP} \cdot \frac{TP}{TP+FN}}{\frac{TP}{TP+FP} + \frac{TP}{TP+FN}} = \frac{2TP}{2TP + FP + FN}$

Table 1 Performance of VGG16
Class        | Precision | Recall | F1-score | Support
0            | 0.91      | 0.77   | 0.83     | 13
1            | 0.69      | 0.64   | 0.67     | 14
2            | 0.73      | 0.92   | 0.81     | 12
3            | 0.76      | 0.76   | 0.76     | 17
4            | 0.80      | 0.86   | 0.83     | 14
5            | 0.71      | 1.00   | 0.83     | 14
6            | 1.00      | 0.75   | 0.86     | 20
Accuracy     |           |        | 0.80     | 100
Macro avg    | 0.80      | 0.81   | 0.80     | 100
Weighted avg | 0.82      | 0.80   | 0.80     | 100

Table 2 Performance of VGG19
Class        | Precision | Recall | F1-score | Support
0            | 0.73      | 0.92   | 0.81     | 12
1            | 0.82      | 0.75   | 0.78     | 12
2            | 1.00      | 0.78   | 0.88     | 18
3            | 0.62      | 0.83   | 0.71     | 12
4            | 0.81      | 0.81   | 0.81     | 16
5            | 0.89      | 0.89   | 0.89     | 18
6            | 1.00      | 0.83   | 0.91     | 12
Accuracy     |           |        | 0.83     | 100
Macro avg    | 0.84      | 0.83   | 0.83     | 100
Weighted avg | 0.85      | 0.83   | 0.83     | 100

Table 3 Performance of Inception V3
Class        | Precision | Recall | F1-score | Support
0            | 1.00      | 0.70   | 0.82     | 10
1            | 0.85      | 0.85   | 0.85     | 20
2            | 0.60      | 0.82   | 0.69     | 11
3            | 0.81      | 0.87   | 0.84     | 15
4            | 0.87      | 0.93   | 0.90     | 14
5            | 0.86      | 0.86   | 0.86     | 14
6            | 1.00      | 0.81   | 0.90     | 16
Accuracy     |           |        | 0.84     | 100
Macro avg    | 0.86      | 0.83   | 0.84     | 100
Weighted avg | -         | -      | -        | -
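
The performance measures defined in this section can be computed from a confusion matrix as in the following sketch. The 3-class matrix is made-up illustrative data, not a result from this study, and the function names are hypothetical. Rows are taken as actual classes and columns as predicted classes.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Precision, recall, F1 and support for each class (row = actual)."""
    n = len(cm)
    rows = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # actual k, predicted other
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, actual other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        rows.append({"precision": precision, "recall": recall,
                     "f1": f1, "support": tp + fn})
    return rows


def accuracy(cm: List[List[int]]) -> float:
    """Correct predictions (the diagonal) over all predictions."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def averages(rows: List[Dict[str, float]], key: str) -> Dict[str, float]:
    """Macro (unweighted) and support-weighted average of one metric."""
    total = sum(r["support"] for r in rows)
    return {
        "macro": sum(r[key] for r in rows) / len(rows),
        "weighted": sum(r[key] * r["support"] for r in rows) / total,
    }


if __name__ == "__main__":
    # Made-up confusion matrix for three classes, for illustration only.
    cm = [[8, 2, 0],
          [1, 7, 2],
          [0, 1, 9]]
    for label, row in enumerate(per_class_metrics(cm)):
        print(label, {k: round(v, 2) for k, v in row.items()})
    print("accuracy", round(accuracy(cm), 2))
    print("recall averages", averages(per_class_metrics(cm), "recall"))
```

In the multi-class setting each class is scored one-vs-rest, and the support-weighted average of recall always equals the overall accuracy, which gives a quick consistency check on a reported table.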


The performance of deep learning models on
remote sensing images has been analyzed. The three
classification algorithms were trained and achieved
accuracies of 94%, 83% and 77%, as shown in
Figures 7, 8 and 9.
5.1. Confusion Matrix of Three Algorithms

5.2. Epoch Result of Three Algorithms

Figure 9 Inception V3


Conclusion 

The proposed system uses customized data to
train classification models, namely Visual
Geometry Group 16 (VGG16), Visual Geometry
Group 19 (VGG19) and Inception V3. The system
also classifies objects at different scales in aerial
images using the above-mentioned algorithms. The
accuracy obtained is 77% for VGG16, 83% for
VGG19 and 94% for Inception V3. Inception V3
gives better results than VGG16 and VGG19. Future
enhancement of the project would incorporate
effective detection of small objects and transfer
learning for the algorithms.